A Pipeline to Automate the Updating of a Specialized Protein Database
Abstract
Motivation: The growing number of specialized databases in molecular biology, coupled with the huge increase in the availability of molecular data, necessitates the development of automatic methods for finding and adding relevant information to these databases. Results: We show how a general protein database (Swiss-Prot) can be used as a source of data for a more specialized one (TCDB, the Transport Classification Database). First, we present a maximum-entropy classification method, trained on preprocessed Swiss-Prot records, that achieves high precision and recall in cross-validation experiments when determining which records are relevant to transmembrane transport. Next, we describe a set of rules that can be used to further filter out proteins that are not novel or not well characterized. Using both of these pipeline stages, a human expert only has to examine about 2% of Swiss-Prot records for potential inclusion in
Similar resources
Finding Transport Proteins in a General Protein Database
The number of specialized databases in molecular biology is growing fast, as is the availability of molecular data. These trends necessitate the development of automatic methods for finding relevant information to include in specialized databases. We show how to use a comprehensive database (SwissProt) as a source of new entries for a specialized database (TCDB, the Transport Classification Dat...
Evaluation of Updating Methods in Building Blocks Dataset
With the increasing use of spatial data in daily life, the production of such data from diverse information sources with different precisions and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one way to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...
iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations
The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false-positive or false-negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...
Designing an Intelligent Ontology Construction System Using the ART Neural Network and the C-value Method
In recent years, many efforts have been made to design ontology learning methods and automate the ontology construction process. Ontology construction is time-consuming and costly for almost all domains and applications, so automating it is one way to overcome the knowledge acquisition bottleneck in information systems and reduce the construction cost. In this artic...
Application of Fuzzy Fault Tree Analysis on Oil and Gas Offshore Pipelines
Fault Tree Analysis (FTA), a Probabilistic Risk Assessment (PRA) method, is used to identify the basic causes leading to an undesired event, to represent the logical relations among these basic causes in leading to the event, and finally to calculate the probability that the event occurs. To conduct a quantitative FTA, one needs a fault tree along with failure data for the Basic Events (BEs). Someti...
Publication date: 2007